AI Engineering Architecture
The simplest architecture
In its simplest form:
- an application receives a query
- it sends the query to the model API
- the model generates a response
- the response is returned to the user
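The four steps above can be sketched in a few lines. This is only an illustration: `fake_model_api` is a hypothetical stand-in for whatever provider client you actually use, and `handle_query` is a made-up name.

```python
from typing import Callable


def fake_model_api(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    return f"(model answer to: {prompt})"


def handle_query(query: str, model_api: Callable[[str], str] = fake_model_api) -> str:
    """Receive a query, send it to the model API, and return the response."""
    response = model_api(query)  # the model generates a response
    return response              # the response goes back to the user


print(handle_query("What is RAG?"))
```

Passing the model call in as a parameter keeps the application code unchanged when later components (context, guardrails, caches) are wrapped around it.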
More components can be added
1. Enhance context
Placed between the query and the model API: enhance the context fed into the model by giving it access to external data sources and tools
- via Retrieval Augmented Generation (RAG)
- via tools that allow the model to automatically gather information through APIs such as web search
Quote
Context construction is like feature engineering for foundation models.
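A minimal sketch of context enhancement in the RAG style, assuming a tiny in-memory document list and a toy word-overlap retriever. Real systems typically use embedding-based or term-based search engines; `DOCS`, `retrieve`, and `build_prompt` are hypothetical names used only for illustration.

```python
import re
from typing import List, Set

# Assumed example documents; in practice these come from an external data source.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Premium plans include priority support.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Place the retrieved context in front of the user's question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"


print(build_prompt("How long do refunds take?", DOCS))
```

The prompt returned by `build_prompt` is what would be sent to the model API instead of the raw query.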
2. Put in guardrails
Placed at inputs and outputs: Guardrails help protect you and your users
- input guardrails
  - protect against two types of risks
    - leaking private information to external APIs
    - executing bad prompts that compromise your system
  - how it works
    - e.g. when AI-powered tools detect sensitive data, either the entire query is blocked or the sensitive information is removed
- output guardrails
  - catch output failures: quality failures and security failures
  - specify a policy for handling each failure mode
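The guardrail behaviour described above might look roughly like this. It is a sketch under loud assumptions: a regular expression stands in for a real sensitive-data detector, a single phrase stands in for prompt-attack detection, and the failure policies (a fallback message, redaction) are illustrative choices, not the only options.

```python
import re
from typing import Optional, Tuple

# Assumed detectors for illustration only; production systems use dedicated tools.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
INJECTION_HINTS = ("ignore previous instructions",)


def input_guardrail(query: str) -> Optional[str]:
    """Return a sanitized query, or None when the entire query is blocked."""
    if any(hint in query.lower() for hint in INJECTION_HINTS):
        return None                     # block prompts that look like attacks
    return EMAIL.sub("[EMAIL]", query)  # remove private data before the API call


def output_guardrail(response: str) -> Tuple[str, str]:
    """Label the response with a failure mode and apply that mode's policy."""
    if not response.strip():
        # Quality failure policy (assumed): replace with a fallback message.
        return "quality_failure", "Sorry, I could not produce an answer."
    if EMAIL.search(response):
        # Security failure policy (assumed): redact before showing the user.
        return "security_failure", EMAIL.sub("[EMAIL]", response)
    return "ok", response


print(input_guardrail("Email jane.doe@example.com the report"))
print(output_guardrail("Reach bob@corp.io today"))
```

Returning the failure label together with the final text keeps the policy decision visible, so different failure modes can be logged or retried differently.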
3. Add model router and gateway
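Placed at the model API layer, in front of the models themselves: a router directs each query to a suitable model and a gateway provides a single entry point to the models; together they support complex pipelines and add more security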
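One way to picture a router and a gateway, as a rough sketch: the keyword-based `route` function, the `Gateway` class, the model names, and the API-key check are all hypothetical and chosen only to illustrate routing and a single controlled entry point.

```python
from typing import Callable, Dict, List, Set

ModelFn = Callable[[str], str]


class Gateway:
    """Single entry point to several models; a natural place for access control and logging."""

    def __init__(self, models: Dict[str, ModelFn], allowed_keys: Set[str]):
        self.models = models
        self.allowed_keys = allowed_keys
        self.call_log: List[str] = []

    def call(self, api_key: str, model_name: str, prompt: str) -> str:
        if api_key not in self.allowed_keys:
            raise PermissionError("unknown API key")
        self.call_log.append(model_name)  # record which model served the call
        return self.models[model_name](prompt)


def route(query: str) -> str:
    """Toy router: code-looking queries go to a code model, the rest to a general one."""
    code_words = ("python", "bug", "function", "error")
    return "code-model" if any(w in query.lower() for w in code_words) else "general-model"


gateway = Gateway(
    models={
        "code-model": lambda p: "code:" + p,        # hypothetical model stubs
        "general-model": lambda p: "general:" + p,
    },
    allowed_keys={"k1"},
)
query = "Tell me a joke"
print(gateway.call("k1", route(query), query))
```

In practice the router is often a small classifier rather than a keyword list, but the shape is the same: decide where the query goes, then call through one interface.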
4. Reduce latency and cost with caches
Placed alongside the model API: inference-level caching is typically implemented by model API providers, while system caches can be added in the application
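- major system caching mechanisms:
  - exact caching: reuse a stored response only when the incoming query matches a cached query exactly
  - semantic caching: reuse a stored response when the incoming query is semantically similar to a cached query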
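The difference between the two mechanisms can be sketched as follows. This is a toy: `embed` is a bag-of-words count standing in for a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


class ExactCache:
    """Returns a cached response only for a character-for-character identical query."""

    def __init__(self) -> None:
        self.store = {}

    def get(self, query: str) -> Optional[str]:
        return self.store.get(query)

    def put(self, query: str, response: str) -> None:
        self.store[query] = response


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a cached response when a stored query is similar enough to the new one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


exact, semantic = ExactCache(), SemanticCache()
exact.put("what is the capital of france", "Paris")
semantic.put("what is the capital of france", "Paris")
print(exact.get("What is the capital of France?"))     # different string: exact cache misses
print(semantic.get("What is the capital of France?"))  # same words: semantic cache hits
```

A semantic cache trades some risk of returning a wrong cached answer for a higher hit rate, which is why the threshold matters.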
5. Add agent patterns
Placed in a loop: when the system's output is not enough to accomplish the task, the output is fed back into the system and another cycle starts (similar to an intelligent agent)
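The loop can be sketched as below. `run_agent`, `step`, `is_done`, and the `max_cycles` budget are hypothetical names and choices; the point is only that each output is kept and fed into the next cycle until the task is judged complete.

```python
from typing import Callable, List


def run_agent(
    task: str,
    step: Callable[[str, List[str]], str],
    is_done: Callable[[str], bool],
    max_cycles: int = 5,
) -> List[str]:
    """Run cycles until an output completes the task or the cycle budget runs out."""
    history: List[str] = []
    for _ in range(max_cycles):
        output = step(task, history)  # earlier outputs are available to the next cycle
        history.append(output)
        if is_done(output):
            break
    return history


def toy_step(task: str, history: List[str]) -> str:
    """Hypothetical step: pretends the task needs three cycles to finish."""
    n = len(history) + 1
    return "DONE" if n == 3 else f"partial result {n}"


print(run_agent("summarize report", toy_step, lambda out: out == "DONE"))
```

The cycle cap is a safety choice in this sketch: without it, a task that never reaches a finished state would loop forever.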